Space-efficient data structures for Top-k completion

نویسندگان

  • Bo-June Paul Hsu
  • Giuseppe Ottaviano
چکیده

Virtually every modern search application, either desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists in retrieving from a string set a small number of completions, i.e. strings beginning with a given prefix, that have the highest scores according to some static ranking. In this paper, we focus on the case where the string set is so large that compression is needed to fit the data structure in memory. This is a compelling case for web search engines and social networks, where it is necessary to index hundreds of millions of distinct queries to guarantee a reasonable coverage; and for mobile devices, where the amount of memory is limited. We present three different trie-based data structures to address this problem, each one with different space/time/ complexity trade-offs. Experiments on large-scale datasets show that it is possible to compress the string sets, including the scores, down to spaces competitive with the gzip’ed data, while supporting efficient retrieval of completions at about a microsecond per completion.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

K 2-Treaps: Range Top-k Queries in Compact Space

Efficient processing of top-k queries on multidimensional grids is a common requirement in information retrieval and data mining, for example in OLAP cubes. We introduce a data structure, the K-treap, that represents grids in compact form and supports efficient prioritized range queries. We compare the K-treap with state-of-the-art solutions on synthetic and real-world datasets, showing that it...

متن کامل

Space-efficient Data Structures for String Searching and Retrieval

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 The Models of Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Our Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

متن کامل

Top-k String Auto-Completion with Synonyms

Auto-completion is one of the most prominent features of modern information systems. The existing solutions of auto-completion provide the suggestions based on the beginning of the currently input character sequence (i.e. prefix). However, in many real applications, one entity often has synonyms or abbreviations. For example, “DBMS” is an abbreviation of “Database Management Systems”. In this p...

متن کامل

Colored Range Queries and Document Retrieval

Colored range queries are a well-studied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important one-dimensional colored range queries — colored range listing, colored range top-k queries and colored range counting — and, thus, new bounds fo...

متن کامل

Space-Efficient Top-k Document Retrieval

Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various reduced-space structures that support top-k...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013